Balancing Manual and Automatic Indexing for Retrieval of Paper Abstracts

Authors

  • Kwangcheol Shin
  • Sang-Yong Han
  • Alexander F. Gelbukh
Abstract

MEDLINE is a widely used, very large database of abstracts of research papers in the medical domain. Its abstracts are manually assigned keywords from a controlled vocabulary called MeSH (Medical Subject Headings). The MeSH keywords assigned to a given document are divided into MeSH major headings, which express the main topic of the document, and MeSH minor headings, which express additional information about its topic. The search engine supplied with MEDLINE uses the Boolean retrieval model, with only the MeSH keywords used for indexing. We show that (1) the vector space retrieval model with the full text of the abstracts indexed gives much better results; (2) assigning greater weights to the MeSH keywords than to the terms appearing in the text of the abstracts gives slightly better results; and (3) assigning a slightly greater weight to major MeSH terms than to minor MeSH terms further improves the results.
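
The weighting idea in findings (2) and (3) can be illustrated with a minimal Python sketch of a vector space model in which each record's abstract text, MeSH major headings, and MeSH minor headings are merged into one TF-IDF vector, with every occurrence scaled by the weight of the field it came from, and documents ranked by cosine similarity to the query. The record layout (`text`, `mesh_major`, `mesh_minor`), the values in `FIELD_WEIGHTS`, the whitespace tokenizer, and the smoothed IDF are all hypothetical choices made for illustration; the abstract does not state the paper's exact weighting formula or weight values.

```python
import math
from collections import Counter

# HYPOTHETICAL field weights: MeSH major > MeSH minor > abstract text.
# Illustrative values only; they are not the weights used in the paper.
FIELD_WEIGHTS = {"mesh_major": 3.0, "mesh_minor": 2.5, "text": 1.0}


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def weighted_tf(doc):
    """Merge the fields of one record into a single term-frequency vector,
    scaling each occurrence by the weight of the field it came from."""
    tf = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        # "text" is one string; the MeSH fields are lists of headings.
        parts = [doc[field]] if field == "text" else doc[field]
        for part in parts:
            for term in tokenize(part):
                tf[term] += weight
    return tf


def build_index(docs):
    """Return one TF-IDF vector (a dict) per document plus the IDF table."""
    tfs = [weighted_tf(d) for d in docs]
    n = len(docs)
    df = Counter(term for tf in tfs for term in tf)  # document frequency
    # Smoothed IDF: a term present in every document keeps weight 1.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: w * idf[t] for t, w in tf.items()} for tf in tfs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, vectors, idf):
    """Rank document indices by cosine similarity to a free-text query."""
    counts = Counter(tokenize(query))
    qvec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    return sorted(range(len(vectors)),
                  key=lambda i: cosine(qvec, vectors[i]), reverse=True)


# Toy records, invented for illustration (not MEDLINE data).
SAMPLE_DOCS = [
    {"text": "aspirin reduces fever in children",
     "mesh_major": ["Aspirin"], "mesh_minor": ["Fever"]},
    {"text": "a survey of hospital costs mentions aspirin once",
     "mesh_major": ["Hospital Costs"], "mesh_minor": []},
]

if __name__ == "__main__":
    vectors, idf = build_index(SAMPLE_DOCS)
    print(search("aspirin", vectors, idf))
```

Running the script prints `[0, 1]`: for the query `aspirin`, the record that carries *Aspirin* as a major heading outranks the one that mentions the word once in its text. Setting all three weights to 1.0 reduces the sketch to plain full-text indexing of the abstract plus its keywords, so the configurations described in the abstract can be compared by changing only `FIELD_WEIGHTS`.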

Similar resources

Design and Implementation of Automatic Indexing for Information Retrieval with Arabic Documents

We have put together a corpus of 242 abstracts of Arabic documents using the Proceedings of the Saudi Arabian National Conferences as a source. All these abstracts involve computer science and information systems. We ... ...has been stimulated by the D.O.D. Tipster project (Harman, 1993). Arabic provides a very different context from English, since it is a non-Indo-European language with a complex mor...

Bibliographic database access using free-text and controlled vocabulary: an evaluation

This paper evaluates and compares the retrieval effectiveness of various search models, based on either automatic text-word indexing or on manually assigned controlled descriptors. Retrieval is from a relatively large collection of bibliographic material written in French. Moreover, for this French collection we evaluate improvements that result from combining automatic and manual indexing. Fir...

Bilingual Indexing for Information Retrieval with AUTINDEX

AUTINDEX is a bilingual automatic indexing system for the two languages German and English. It is being developed within the EU-funded BINDEX project. The aim of the system is to automatically index large quantities of abstracts of scientific and technical papers from several areas of engineering. Automatic indexing takes place using a controlled vocabulary provided in monolingual and bilingual...

Annotation of Chest Radiology Reports for Indexing and Retrieval

Annotation of MEDLINE citations with controlled vocabulary terms improves the quality of retrieval results. Due to the variety in descriptions of similar clinical phenomena and the abundance of negation and uncertainty, annotation of clinical radiology reports for subsequent indexing and retrieval with a search engine is even more important. Provided with an opportunity to add about 4,000 radiology rep...

Consistency between keywords extracted from abstracts and indexers' descriptors in the "Iran's Theses Abstracts" database

Purpose: This research is devoted to studying the consistency between keywords extracted from abstracts of theses by experts in the related fields and descriptors provided by the indexers in the “Iran’s Theses Abstracts” database. Methodology: This research is an applied study based on content analysis. A checklist consisting of 32 criteria was used. In addition, we consulted the experts ...

Publication date: 2004